084637.011200 



AMENDMENTS TO THE CLAIMS 

1. (Currently amended) A method for performing a supervised learning process in an 
artificial intelligence environment including optimizing a database of sample records for the 
training and testing of a prediction algorithm for predicting the presence or absence of a 
specified medical condition in a patient, the method comprising the steps of: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or 
more prediction algorithms and assigning a fitness score to each, each of said prediction 
algorithms being associated with a certain distribution of said database records; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates 
a set of one or more second generation prediction algorithms and assigns a fitness score to 
each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs, where said termination event is at least one of 

a prediction algorithm is generated with a fitness score equal to or exceeding a 
defined minimum value, 

the maximum fitness score of successive generational sets of prediction 
algorithms converging to a given value, and 

a certain number of generations having been generated; 
selecting a prediction algorithm having a best fitness score; and 
using the distribution of database records associated with said selected prediction 
algorithm in performing supervised learning, said supervised learning including training and 
testing of prediction algorithms to obtain a trained prediction algorithm, for application to a 
predetermined problem, 
wherein 

said method is performed using a computer and computer software forming an 
intelligent system, and 

the trained prediction algorithm is effective to predict output variables for data 
relating to said condition, thereby predicting diagnosis of said condition. 
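
The steps of claim 1 can be illustrated with a minimal Python sketch. It is only one possible reading under stated assumptions, not the claimed implementation: records are single numbers, the prediction algorithm is a toy 1-nearest-neighbour predictor, a distribution is a 0/1 list (1 = training, 0 = testing), and the names `split_fitness`, `optimise_split`, `patience`, `pop_size` and `min_fitness` are hypothetical.

```python
import random

def split_fitness(records, labels, mask):
    """Fitness of one distribution: train a 1-nearest-neighbour predictor on
    records with mask == 1 and return its accuracy on records with mask == 0."""
    train = [i for i, m in enumerate(mask) if m == 1]
    test = [i for i, m in enumerate(mask) if m == 0]
    if not train or not test:
        return 0.0  # a split with an empty subset cannot be scored
    hits = 0
    for t in test:
        nearest = min(train, key=lambda i: abs(records[i] - records[t]))
        hits += labels[nearest] == labels[t]
    return hits / len(test)

def optimise_split(records, labels, pop_size=8, max_generations=30,
                   min_fitness=0.95, patience=5, seed=0):
    """Evolve train/test distributions and return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(records)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_mask, best_fit, history = None, -1.0, []
    for _ in range(max_generations):          # termination: number of generations
        scored = sorted(((split_fitness(records, labels, m), m) for m in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_mask = scored[0]
        history.append(scored[0][0])          # maximum fitness of this generation
        if best_fit >= min_fitness:           # termination: fitness threshold reached
            break
        if len(history) >= patience and len(set(history[-patience:])) == 1:
            break                             # termination: maximum fitness converged
        # Assumption: simple truncation selection, one-point crossover, one mutation.
        parents = [m for _, m in scored[:max(2, pop_size // 2)]]
        population = []
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)
            child[flip] = 1 - child[flip]
            population.append(child)
    return best_mask, best_fit
```

The returned mask is the distribution associated with the best-scoring prediction algorithm; in the claim it would then be used for the final training and testing.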

2. (Currently amended) The method of claim 1 characterised in that it comprises the 
following steps: 

generating a population of prediction algorithms, where each one of said prediction 
algorithms is trained and tested according to a different distribution of the records of the data 
set in the complete database onto a training data set and a testing data set; 

each different distribution being created as one of a random or pseudo random 
distribution and a distribution formed by a deterministic mathematical process characterized 
as a pseudorandom distribution; 

each prediction algorithm of the said population is trained according to its own 
distribution of records of the training set and is validated in a blind way according to its own 
distribution on the testing set; 

a score reached by each prediction algorithm is calculated in the testing phase 
representing its fitness; 

an evolutionary algorithm being further provided which combines the different models 
of distribution of the records of the complete data set in a training and in a testing set which 
sets are represented each one by a corresponding prediction algorithm trained and tested on 
the basis of the said training and testing data set according to the fitness score calculated in 
the previous step for the corresponding prediction algorithm; 

the fitness score of each prediction algorithm corresponding to one of the different 
distributions of the complete data set on the training and the testing data sets being the 
probability of evolution of each prediction algorithm or of each said distribution of the 
complete data set on the training and testing data sets; 

repeating the evolution of the prediction algorithm generation for a finite number of 
generations or till the output of the genetic algorithm converges to a best solution and/or till 
the fitness value of at least some prediction algorithm related to an associated data records 
distribution has reached a desired value; 

setting the data records distribution for the best solution as the optimized training and 
testing subsets for training and testing prediction algorithm. 
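
Claim 2 states that the fitness score is the probability of evolution of each distribution. One common way to realise this, offered here only as an assumed reading, is fitness-proportionate selection. The sketch below uses Python's standard `random.choices`; the function name `select_for_evolution` and the uniform fallback are hypothetical additions.

```python
import random

def select_for_evolution(distributions, fitness, k, seed=0):
    """Draw k distributions, each with probability proportional to its fitness.

    Assumes fitness scores are non-negative (for example, test accuracies)."""
    rng = random.Random(seed)
    if sum(fitness) <= 0:
        # Assumption: sample uniformly when no candidate scored above zero,
        # because random.choices rejects weights that sum to zero.
        return rng.choices(distributions, k=k)
    return rng.choices(distributions, weights=fitness, k=k)
```

A distribution with fitness zero is never drawn while any other candidate has a positive score, so better-scoring distributions contribute more often to the next generation.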

3. (Previously presented) A method according to claim 1 characterised in that to each 
record of the data set a distribution variable is associated which is binary and has at least two 
statuses, one of these two statuses being associated with the inclusion of the record in the training 
set and the other in the testing set. 
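
The binary distribution variable of claim 3 can be pictured as one flag per record. In this small sketch the encoding (1 = training, 0 = testing) and the name `split_by_status` are assumptions chosen for illustration.

```python
def split_by_status(records, status):
    """Assign each record to the training set (status 1) or testing set (status 0)."""
    if len(records) != len(status):
        raise ValueError("one status value is required per record")
    training = [r for r, s in zip(records, status) if s == 1]
    testing = [r for r, s in zip(records, status) if s == 0]
    return training, testing
```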

4. (Previously presented) A method according to claim 1 characterised in that the 
prediction algorithm is an artificial neural network. 

5. (Previously presented) A method according to claim 1, characterised in that the 
prediction algorithm is a classification algorithm. 

6. (Previously presented) A method according to claim 1 characterised in that once 
an optimum distribution has been computed, the optimised training data subset is made equal 
to a complete data set being the individuals included in the training subset distributed onto a 
new training set and onto a new testing set each one having about the half of the records of the 
original optimized training set, while the originally optimized testing set is used as a third data 
subset for validation purposes. 
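
The three-way arrangement of claim 6 can be sketched as follows. The random halving is an assumption made for brevity (claim 7 instead optimises this second split with the method itself), and `three_way_split` is a hypothetical name.

```python
import random

def three_way_split(optimised_training, optimised_testing, seed=0):
    """Redistribute the optimised training set onto new training and testing
    sets of about half its size each; keep the optimised testing set aside
    as the third, validation subset."""
    rng = random.Random(seed)
    shuffled = list(optimised_training)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    new_training, new_testing = shuffled[:half], shuffled[half:]
    validation = list(optimised_testing)
    return new_training, new_testing, validation
```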

7. (Previously presented) A method according to claim 6 characterised in that the 
distribution of the data of the originally optimized training set onto the new training and new 
testing set is optimized by means of a pre-processing phase including the steps of said method 
for optimizing a database of sample records, said records being records in the originally 
optimized training set. 

8. (Previously presented) A method according to claim 1, in which different choices 
of the structure of the training subset and the structure of the testing subset consist in different 
selections of the number of input variables of the data records of the database, which 
selections consist in leaving out at least one, preferably two or more variables from the entire 
input variable set forming each record, the records of the database comprising a certain 
number of known input variables and a certain number of known output variables. 

9. (Previously presented) A method according to claim 8, characterised by the 
following steps: 

defining a distribution of data from the complete data set onto a training data set and 
onto a testing data set; 

generating a population of different prediction algorithms each one having a training 
and/or testing data set in which only some variables have been considered among all the 
original variables provided in the data sets, each one of the prediction algorithms being 
generated by means of a different selection of variables; 

carrying out learning and testing of each prediction algorithm of the population and 
evaluating the fitness score of each prediction algorithm; 

applying an evolutionary algorithm to the population of prediction algorithms for 
achieving new generations of prediction algorithms; 

for each generation of new prediction algorithms representing each one a new 
different selection of input variables, the best prediction algorithm according to the best 
hypothesis of input variables selection is tested or validated; 

a fitness score is evaluated and the prediction algorithms representing the selections of 
input variables which have the best testing performances and the minimum input variables are 
promoted for the processing of the new generations. 
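
The promotion rule at the end of claim 9 (best testing performance, then fewest input variables) can be sketched as a sort key. The 0/1 selection mask and the names `apply_selection` and `rank_selections` are assumptions for illustration; the claim does not fix how the two criteria are combined.

```python
def apply_selection(record, mask):
    """Keep only the input variables whose selection flag is 1."""
    return [value for value, keep in zip(record, mask) if keep]

def rank_selections(candidates):
    """Order (selection_mask, test_accuracy) pairs: highest accuracy first,
    ties broken in favour of the selection using fewer input variables."""
    return sorted(candidates, key=lambda c: (-c[1], sum(c[0])))
```

Under this reading, two selections with equal testing accuracy are ranked so that the one leaving out more variables is promoted first.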

10. (Previously presented) A method according to claim 8, further comprising a pre- 
processing phase, including the steps of said method for optimizing a database of sample 
records, for selecting the most predictive input variables. 

11. (Previously presented) A method according to claim 2, 

in which different choices of the structure of the training subset and the structure of 
the testing subset consist in different selections of the number of input variables of the data 
records of the database, which selections consist in leaving out at least one, preferably two or 
more variables from the entire input variable set forming each record, the records of the 
database comprising a certain number of known input variables and a certain number of 
known output variables, 

and further comprising a pre-processing phase, including the steps of said method for 
optimizing a database of sample records, for selecting the most predictive input variables, 

characterised in that the database subjected to the pre-processing phase of input 
variable selection is a training subset and a testing subset processed with said method. 

12. (Previously presented) A method according to claim 2 characterised in that the 
complete database the distribution of the records of which has to be optimized has data 
records having a selected number of input variables, the selection being carried out with said 
method, and in which different choices of the structure of the training subset and the structure 
of the testing subset consist in different selections of the number of input variables of the data 
records of the database, which selections consist in leaving out at least one, preferably two or 
more variables from the entire input variable set forming each record, the records of the 
database comprising a certain number of known input variables and a certain number of 
known output variables. 

13. (Previously presented) A method according to claim 1 characterised in that a pre- 
processing phase for optimizing the distribution of the records on a training subset and a 
testing subset and for selecting the most predictive input variables, is carried out alternatively 
one to the other several times. 

14. (Previously presented) A method according to claim 1 characterised in that the 
evolutionary algorithm is a genetic algorithm with the following evolutionary rules: 

an average health value of the population is computed as a function of the fitness 
values of each single individual in the population; 

coupling, recombination of genes and mutation of genes are carried out in a 
differentiated manner depending on a comparison between the fitness of each individual of 
the couple and the average health value of the entire population to which the individuals 
belong; 

individuals having a fitness value lower or equal to the average health of the entire 
population are not excluded from the creation of new generations but are marked out and 
entered in a vulnerability list; 

the number of subjects entered in the vulnerability list defining the number of possible 
marriages. 
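
The average health value and the vulnerability list of claim 14 can be sketched as below. The claim only says the average health is a function of the individual fitness values; the arithmetic mean used here is an assumption, and `vulnerability_list` is a hypothetical name.

```python
def vulnerability_list(fitness):
    """Return (average_health, indices of individuals entered in the list).

    Individuals whose fitness is lower than or equal to the average health are
    marked out; they are listed, not removed from the population."""
    average_health = sum(fitness) / len(fitness)  # assumption: arithmetic mean
    vulnerable = [i for i, f in enumerate(fitness) if f <= average_health]
    return average_health, vulnerable
```

Per the claim, the length of the returned list is what defines the number of possible marriages.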

15. (Previously presented) A method according to claim 14 in which for coupling 
purposes and for generation of children at least one parent individual must have a fitness 
value greater than the average health value of the population. 

16. (Previously presented) A method according to claim 14, characterised in that 
each couple of individuals can generate individuals having a fitness different from the average 
health, so-called offspring, if the fitness of at least one of them is greater than the average 
fitness, the offspring of each marriage occupying the places of subjects entered in the 
vulnerability list and being marked out, so that a weak individual can continue to exist through 
his own children. 

17. (Previously presented) A method according to claim 14, characterised in that 
coupling between individuals having a very low fitness value and a very high fitness value is 
not allowed. 

18. (Previously presented) A method according to claim 14, characterised in that the 
following recombination rules for the genes of the coupled parent individuals are considered 
in the case where the parent individuals have no common genes: 

the health of father and mother individuals are greater than the average health of the 
entire population; the crossover is a classical crossover according to which the genes of the 
father and of the mother individuals are substituted one with the other starting from a certain 
crossover point; 

the health of father and mother individuals are lower than the average health of the 
entire population; in this case the two children are formed through rejection of the parents 
genes they will receive by the crossover process; 

the health of one of the parents is less than the average health of the entire population 
while the health of the other parent is greater than the average health of the entire population; 
in this case only the parents whose health is greater than the average health of the entire 
population will transmit their genes, while the genes of the parent having a health lower than 
the average health of the entire population are rejected. 

19. (Previously presented) A method according to claim 18, wherein each gene is 
characterised by a status level, the method further characterised in that genes rejection 
consists in modifying the status of the genes from one status level to a different status level. 
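
The three recombination cases of claim 18, with gene rejection as defined in claim 19, can be sketched for binary genes. This is one assumed reading: rejection is modelled as flipping a 0/1 status, the crossover is one-point, a parent exactly at the average health is treated as weak, and the names `reject` and `recombine` are hypothetical.

```python
def reject(genes):
    """Gene rejection for binary genes: change each status to the other level."""
    return [1 - g for g in genes]

def recombine(father, mother, health_f, health_m, average_health, cut):
    """One-point crossover in which a parent at or below the average health
    contributes rejected (flipped) genes instead of its own.

    Both parents above average: classical crossover.
    Both below: the children receive rejected genes from both parents.
    One above, one below: only the healthier parent transmits its genes."""
    f = father if health_f > average_health else reject(father)
    m = mother if health_m > average_health else reject(mother)
    child_1 = f[:cut] + m[cut:]
    child_2 = m[:cut] + f[cut:]
    return child_1, child_2
```

Because flipping is applied gene by gene, rejecting before or after the crossover gives the same children, which is why a single code path covers all three cases.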

20. (Previously presented) A method according to claim 18, characterised in that a 
modified crossover of the genes of the parent individuals is carried out when the parent 
individuals have part of their genes in common, this modified crossover providing for generating 
an offspring in which the genes selected for crossover are the most effective ones of the 
parents. 

21. (Previously presented) A method according to claim 14 in which the individuals 
are the different prediction algorithms, each representing a corresponding different initial random 
distribution of data records onto the testing and the training data set and the genes consist in 
the binary status variable of association of each record to the training and to the testing subset. 

22. (Previously presented) A method according to claim 14 in which the individuals 
are the prediction algorithms each one representing a different training and testing data set, 
the difference residing in a different selection of input variables for each different training and 
testing subset, and the genes consist in the different selection variable which is provided for 
each input variable in the different training and testing subsets, the above mentioned selection 
variable being a parameter indicating the presence/absence of each corresponding input 
variable in the records of each data set. 

23. (Previously presented) A method according to claim 1 characterized in that it is in 
the form of a software program comprising instructions executable by a CPU, the software 
program being stored in a memory to which the CPU can access. 

24. (Previously presented) A software program stored on a memory device, the said 
software program consisting in the method according to claim 1 in the form of executable 
instructions of a CPU or of a computer system. 

25. (Previously presented) A system for carrying out a method according to claim 1 
comprising an apparatus or device for generating an action of response which is 
autonomously, i.e. by itself, chosen among a certain number of different kinds of actions of 
response stored in a memory of the apparatus or autonomously generated by the apparatus 
basing the said choice of the kind of action of response on the interpretation of data collected 
autonomously by means of one or more sensors responsive to physical entities or which are 
fed to the apparatus by means of input means, the said interpretation being made by means of 
a prediction algorithm in the form of a software saved in a memory of the said apparatus and 
being carried out by a central processing unit, characterized in that the apparatus being further 
provided with means for carrying out a training and testing phase of the prediction algorithm 
by inputting to the said prediction algorithm data of a known database in which input 
variables of the input data representing the physical entities able to be sensed by the 
apparatus through the one or more sensors and/or able to be fed to the apparatus by means of 
the input means are univocally correlated to at least one definite kind of action of response 
among the different kinds of possible action of response, the said means for carrying out the 
training and testing being in the form of a training and testing software saved in a memory of 
the apparatus, the said training and testing being carried out by means of a method according 
to claim 1, the said training and testing software program being the said method of training 
and testing in the form of a software program or instructions. 

26. (Previously presented) The system according to claim 25, characterized in that it 
is a system for sound or vocal recognition comprising input means responsive to acoustic 
waves, a processing unit connected to the input means responsive to acoustic waves, at least a 
memory in which a software program is stored the said program being in the form according 
to claims 23 or 24 and comprising coded instructions for enabling the processing unit to carry 
out a method according to claim 1, a further or the same above mentioned memory in which a 
dataset of known data records is stored or can be stored and/or input means for storing in the 
further or the said above mentioned memory a dataset of known data records. 

27. (Previously presented) The system according to claim 25, characterized in that it 
is a system for image recognition, the input means being responsive to electromagnetic 
waves, the system being able to recognize the shape of an object generating or reflecting 
electromagnetic waves, and/or the distance and/or the identity of the object. 

28. (Previously presented) The system according to claim 26, characterized in that 
the database of known data records comprises acoustic signals emitted by one or more objects 
or one or more living beings making part of the typical environment in which the device has 
to operate or the data relating to one or more images of one or more objects or one or more 
living beings making part of the typical environment in which the device has to operate to 
which are univocally correlated to corresponding known kind, and/or identity and/or meaning 
of objects to which the said acoustic signals or image data are related and/or from which the 
said acoustic signals or image data are generated. 

29. (Previously presented) The system according to claim 27, characterized in that it 
is a specialized system for image pattern recognition having artificial intelligence utilities for 
analyzing a digitalized image, i.e. an image in the form of an array of image data records, each 
image data record being related to a zone or point or unitary area or volume of a two or three 
dimensional visual image, so called pixel or voxel of a visual image, the said visual image 
being formed by an array of the said pixels or voxels and utilities for indicating for each 
image data record a certain quality among a plurality of known qualities of the image data 
records; 

the system having a processing unit as for example a conventional computer, a 
memory in which an image pattern recognition algorithm is stored in the form of a software 
program which can be executed by the processing unit; 

a memory in which a certain number of predetermined different qualities which the 
image data records can assume has been stored and which qualities have to be univocally 
associated to each of the image data records of an image data array fed to the system; 

input means for receiving arrays of digital image data records or input means for 
generating arrays of digital image data records from an existing image and a memory for 
storing the said digital image data array; 

output means for indicating for each image data record of the image data array a 
certain quality chosen by the processing unit in carrying out the image pattern recognition 
algorithm in the form of the said software program; 

the image pattern recognition algorithm is a prediction algorithm in the form of a 
software program, which prediction algorithm is further associated to a system being further 
provided with a training and testing software program; 

the system is able to carry out training and testing according to the method of claim 1; 

the method is provided in the system in the form of the training and testing software 
program; 

a database being also provided in which data records are contained univocally 
associating known image data records of known image data arrays with the corresponding 
known quality from a certain number of predetermined different qualities which the image 
data records can assume. 

30. (Previously presented) A method for producing a microarray for genotyping 
operations, the said method comprising the steps of defining a certain number of theoretically 
relevant genes or alleles or polymorphisms considered relevant for a certain biologic 
condition like a tissue structure, a pathology or the potentiality of developing a pathology or 
an anatomic or morphologic feature: 

a) providing a database of experimentally determined data in which each record relates 
to a known clinical or experimental case of a sample population of cases and which records 
comprise a certain number of input variables corresponding to the presence/absence of a 
certain predetermined number of polymorphisms and/or mutations and/or equivalent genes of 
a certain number of theoretically probable relevant genes, said certain predetermined number 
of polymorphisms and/or genes forming a set, and one or more related output variables 
corresponding to the certain biological or pathologic condition of the said clinical and 
experimental cases of the sample population; 

characterized by the following further steps: 

b) determining a selection of a subset of the set of certain predetermined number of 
polymorphisms and/or genes by testing the association of the said genes or polymorphisms 
and the biological or pathological condition by means of mathematical tools applied to the 
database; 

c) the said mathematical tools comprise a so called prediction algorithm such as a so 
called neural network; 

and the further steps are carried out of: 

d) dividing the database into a training and a testing dataset for training and testing the 
prediction algorithm; 

e) defining two or more different training datasets each one having records with a set 
of input variables obtained by excluding one or more input variables from the originally 
defined number of input variables, while for each record the set of input variables of the 
corresponding training set has at least one input variable which is not a member of the set of 
input variables of the other training datasets, each said at least one input variable consisting in 
a different gene or a different polymorphism and/or a different mutation and/or a different 
functionally equivalent gene thereof of the originally considered genes or polymorphisms 
and/or mutations and/or functionally equivalent genes thereof considered theoretically 
potentially relevant for the biologic or pathologic condition; 

f) training the prediction algorithm with each of the different training sets defined 
under point e) for generating a first population of different prediction algorithms which are 
divided into two groups of mother and father prediction algorithms and testing the said 
prediction algorithms with the associated testing set; 

g) calculating a fitness score or prediction accuracy of each father and mother 
prediction algorithms of the said first population by means of the testing results; 

i) providing a so called evolutionary algorithm such as a genetic algorithm and applying 
the evolutionary algorithm to the first population of mother and father prediction algorithms 
for achieving a new generation of prediction algorithms whose training and testing dataset 
comprises records whose input variables selections are a combination of the input variable 
selections of the records of the training and of the testing datasets of the first or previous 
population of father and mother prediction algorithms according to the rules of the 
evolutionary algorithm; 

j) for each generation of new prediction algorithms representing each new variant 
selection of input variables, the best prediction algorithm according to the best hypothesis of 
input variable selection is tested or validated by means of the testing dataset; 

k) a fitness score is evaluated and the prediction algorithms representing the selections 
of input variables which have the best testing performance with the minimum number of input 
variables utilized are promoted for the processing of new generations; 

l) repeating the steps i) to k) until a predetermined fitness score defined as best fit of 
the prediction algorithm and a minimum number of input variables has been reached; 

m) defining as the selected relevant input variables i.e. as the relevant genes or 
polymorphisms and/or of mutations and/or of functionally equivalent genes thereof the ones 
related to the input variables of the selection represented by the prediction algorithm having 
both at least the predetermined fitness score and also the minimum number of selected input 
variables. 
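
Step e) of claim 30 requires that each training dataset keep at least one input variable that no other training dataset contains. A minimal check of that condition could look like the sketch below; representing each selection as a Python set of variable names, and the function name `each_has_unique_variable`, are assumptions for illustration.

```python
def each_has_unique_variable(selections):
    """Return True when every selection contains at least one input variable
    (gene, polymorphism or mutation) that is absent from all other selections."""
    for i, chosen in enumerate(selections):
        others = set().union(*(s for j, s in enumerate(selections) if j != i))
        if not (chosen - others):
            return False
    return True
```

For example, the selections {g1, g2} and {g2, g3} satisfy the condition, whereas {g1, g2} and {g2} do not, because the second selection has no variable of its own (the gene labels are placeholders).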

31. (Previously presented) A method according to claim 30, characterized in that an 
optimization of the distribution of the records of the original database in a training dataset and 
in a testing dataset is carried out in one of a pre processing and a post processing phase, i.e. 
before carrying out the steps e) to m) at step d) or after having carried out the steps a) to m). 

32. (Previously presented) The method according to claim 31 comprising the 
following steps of optimisation: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or 
more prediction algorithms and assigning a fitness score to each; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates 
a set of one or more second generation prediction algorithms and assigns a fitness score to 
each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs; 

where said termination event is at least one of a prediction algorithm is generated with 
a fitness score equalling or exceeding a defined minimum value, the maximum fitness score 
of successive generational sets of prediction algorithms converging to a given value, and a 
certain number of generations having been generated. 

33. (Previously presented) The method according to claim 31, comprising the 
following steps: 

generating a population of prediction algorithms, where each one of them is trained and tested 
according to a different distribution of the records of the data set in the complete database 
onto a training data set and a testing data set; 

each different distribution being created by a random or pseudo-random distribution; 

each prediction algorithm of the said population is trained according to its own 
distribution of records of the training set and is validated in a blind way according to its own 
distribution on the testing set; 

a score reached by each prediction algorithm is calculated in the testing phase 
representing its fitness; 

an evolutionary algorithm being further provided which combines the different models 
of distribution of the records of the complete data set in a training and in a testing set which 
sets are represented each one by a corresponding prediction algorithm trained and tested on 
the basis of the said training and testing data set according to the fitness score calculated in 
the previous step for the corresponding prediction algorithm; 

the fitness score of each prediction algorithm corresponding to one of the different 
distributions of the complete data set on the training and the testing data sets being the 
probability of evolution of each prediction algorithm or of each said distribution of the 
complete data set on the training and testing data sets; 

repeating the evolution of the prediction algorithm generation for a finite number of 
generations or till the output of the genetic algorithm converges to a best solution and/or till 
the fitness value of at least some prediction algorithm related to an associated data records 
distribution has reached a desired value; 

setting the data records distribution for the best solution as the optimized training and 
testing subsets for training and testing prediction algorithm. 

34. (Previously presented) A microarray for genotyping comprising a reduced 
number of genes, alleles or polymorphisms characterized in that the reduced number of the 
said genes, alleles or polymorphisms has been selected by means of a method according to 
claim 30. 

35. (New) A method for performing a supervised learning process in an artificial 
intelligence environment including optimizing a database of sample records for the training 
and testing of a prediction algorithm for a problem under investigation characterized by input 
variables and output variables, the prediction algorithm used for predicting output variables 
for real world data, the method comprising the steps of: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or 
more prediction algorithms and assigning a fitness score to each, each of said prediction 
algorithms being associated with a certain distribution of said database records; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates 
a set of one or more second generation prediction algorithms and assigns a fitness score to 
each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs, where said termination event is at least one of 

a prediction algorithm is generated with a fitness score equal to or exceeding a 
defined minimum value, 

the maximum fitness score of successive generational sets of prediction 
algorithms converging to a given value, and 

a certain number of generations having been generated; 
selecting a prediction algorithm having a best fitness score; 
using the distribution of database records associated with said selected prediction 
algorithm in performing supervised learning, said supervised learning including training and 
testing of prediction algorithms to obtain a trained prediction algorithm; and 
using the trained prediction algorithm to predict the output variables relating to the 
problem under investigation where only the input variables are known, 

wherein said method is performed using a computer and computer software 
forming an intelligent system. 